[RFC] Add a Mutex trait #377

Merged
merged 21 commits into rust-embedded:master on Nov 19, 2019

Conversation

korken89
Contributor

@korken89 korken89 commented Sep 10, 2019

@korken89 korken89 requested review from dylanmckay, jcsoo and a team as code owners September 10, 2019 07:34
@korken89 korken89 changed the title Add a Mutex trait [RFC] Add a Mutex trait Sep 10, 2019
@korken89 korken89 added the RFC label Sep 10, 2019
Member

@andre-richter andre-richter left a comment

Writing from mobile, having limited means to look stuff up, so please correct if wrong.
IIRC, std::sync::Mutex's lock() creates a RAII guard that one can dereference to get access to the data.

This trait's lock function expects a closure. I personally tend to like closures more because they give a better visual indication of the critical section, but wanted to ask nonetheless whether a guard approach was considered to stay closer to std.
The guard impl could be done by us; the trait would have to come from the user.

@roblabla

roblabla commented Sep 10, 2019

I prefer a closure-based locking solution. There are certain kinds of locks that cannot be implemented with a guard, because the guard's destructor can be moved around and reordered. For instance, locks that back up and disable the current interrupt state cannot (easily) tolerate such reordering.

It's possible to implement a closure-based lock on top of a guard-based lock, but not the other way around, e.g.:

fn lock<R>(&self, closure: impl FnOnce(&mut T) -> R) -> R {
    let mut guard = self.inner_lock.lock();
    closure(&mut *guard)
}

As such, I think a closure-based design is better.

@andre-richter
Member

andre-richter commented Sep 10, 2019

because the destructor can be moved around and reordered.

You mean the compiler moving around the destructor?

@korken89
Contributor Author

From my side there are four reasons why this should not be based on RAII but on a closure:

  1. As @roblabla already touched on, the RAII version can be used to implement the closure version, hence the closure API can be backed by an std-like mutex.
  2. The closure is explicit about where it starts and ends; there is no need to wait for the destructor or add extra { ... } around code to get predictable unlocks.
  3. It makes code using locks more stable under refactoring, as small changes can drastically change the scope of a lock when using RAII.
  4. A RAII-based mutex guard can be mem::forgotten and all guarantees are out; the closure-based mutex cannot be circumvented as easily (see the small illustration below).

Let's continue discussing, I'll update the RFC with these thoughts after. :)
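
As a small illustration of point 4 (a sketch using std types for brevity; the same reasoning applies to any guard-based embedded mutex):

use std::mem;
use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0u32);

    // Guard API: if the guard is leaked, its Drop never runs and the
    // unlock never happens; nothing ties the unlock to a scope.
    let guard = m.lock().unwrap();
    mem::forget(guard);
    assert!(m.try_lock().is_err()); // the mutex is still locked

    // With the closure API the unlock is performed by lock() itself after
    // the closure returns, so there is no guard value to leak.
}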

@roblabla

roblabla commented Sep 10, 2019

@andre-richter I mean you can move around the Guard.

E.g.:

let x = lock1.lock();
let y = lock2.lock();
drop(x);
drop(y);

Here, lock1 is locked before lock2, but unlocked first. This might work for some types of mutexes, but others will be hard (or impossible) to implement this way - such as locks that simply back up and disable interrupts on single-core systems. Closures ensure a hierarchy in the order in which mutexes are locked/unlocked.
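
To make the interrupt case concrete, here is a rough sketch of such a lock (read_and_disable_interrupts and restore_interrupts are placeholders for the target-specific register accesses, not real APIs):

use core::cell::UnsafeCell;

/// Sketch of a single-core critical-section mutex; not a real implementation.
pub struct IrqMutex<T> {
    data: UnsafeCell<T>,
}

impl<T> IrqMutex<T> {
    pub fn lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Save the current interrupt state, then disable interrupts.
        let was_enabled = read_and_disable_interrupts();
        // With interrupts off on a single core nothing can preempt us, so
        // handing out `&mut` to the data is fine for the duration of `f`.
        let r = f(unsafe { &mut *self.data.get() });
        // The restore happens right here, after the closure returns. With a
        // guard API the user could drop two guards out of order and
        // re-enable interrupts while the outer "lock" is still held.
        restore_interrupts(was_enabled);
        r
    }
}

// Placeholder stubs so the sketch stands alone; a real port would read and
// write PRIMASK or the platform's equivalent.
fn read_and_disable_interrupts() -> bool { true }
fn restore_interrupts(_was_enabled: bool) {}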

@andre-richter
Member

andre-richter commented Sep 10, 2019

Yeah, like I said, I'm in favor of closures as well. Thanks for providing the feedback so far. I just wanted to point out that users coming from the std world might be confused for a second, but it's a price we can easily pay.

In the past I wanted to check/benchmark if RAII vs. closure has any (minimal) performance differences.

If somebody feels like writing a micro benchmark, this would be a nice opportunity now and potential content for the (dis)advantages section.

@korken89
Contributor Author

After inlining the performance should be the same as they fundamentally do the same operations, but if someone wants to benchmark I can add the results.
We know from experiments in RTFM that the closures inline and disappear at least, but I have no data for RAII mutexes.

Another thing that came up with a RAII-based Mutex: if someone runs mem::swap on two mutexes, what does this mean?

@andre-richter
Member

Yeah I was actually wondering if there are situations where the compiler would decide to call the closure as a function.

Agree that with inlining it is zero overhead.

We could put a note about what is needed (if any) to guarantee that inlining is done, or what should be avoided in a release build because it would defeat inlining.

@roblabla

I've found https://github.com/japaric/cortex-m-rtfm/blob/2596ea0e46bec73d090d9e51d41e6c2f481b8e15/book/en/src/internals/critical-sections.md which explains in much finer detail the reasoning behind &mut T for the mutex (with some clear examples of what it enables). Maybe this should get mentioned in the RFC.

@korken89
Contributor Author

Thanks, I will add it for clarification.

@andre-richter
Member

andre-richter commented Sep 10, 2019

Follow-up points from the discussion on Matrix:

  • Trait bounds needed for indicating properties like single/multi-core usage, etc.

@therealprof
Contributor

I think it should be an unsafe trait due to the extra care required by the implementer to implement it soundly, as mentioned in https://github.com/rust-lang/rfcs/blob/master/text/0019-opt-in-builtin-traits.md:

Unsafe traits: An unsafe trait is a trait that is unsafe to implement, because it represents some kind of trusted assertion. Note that unsafe traits are perfectly safe to use. Send and Share are examples of unsafe traits: implementing these traits is effectively an assertion that your type is safe for threading.

There's also precedent in the core lib for this (https://doc.rust-lang.org/src/core/str/pattern.rs.html#94) and a helpful comment outlining the reasoning:

/// The trait is marked unsafe because the indices returned by the
/// next() methods are required to lie on valid utf8 boundaries in
/// the haystack. This enables consumers of this trait to
/// slice the haystack without additional runtime checks.

@roblabla

roblabla commented Sep 10, 2019

There's also precedent in the core lib for this (https://doc.rust-lang.org/src/core/str/pattern.rs.html#94) and a helpful comment outlining the reasoning:

There's a difference though. This trait is unsafe because other code is allowed to assume the returned value is on a valid utf8 boundary. If an implementation fails to respect this, UB and memory corruption are likely to occur. This requirement is not actually encoded in the type system, so unsafe is necessary.

In our case, the requirement is that the closure is "guaranteed exclusive access to the data contained within the mutex for the duration of the lock." But this is already guaranteed by the type system: We give the closure an &mut T, which is by definition an exclusive access to the data contained. The only way to not respect this is by using unsafe to create an (invalid) &mut pointer - at which point the user already acknowledged they were treading on unsafe grounds.

@perlindgren

@roblabla, I'm not sure I understand. Implementing the Mutex trait should be unsafe: we have to take responsibility that the lock function gives exclusive access for the duration of the lock (in our case, the closure). The closure itself does not need to be unsafe to use; only implementing the trait is unsafe.

/Per

@roblabla

@perlindgren All of the requirements for safe usage of this trait are encoded by the type system. Safe usage of this trait requires that the closure has exclusive access to T. The closure gets a &mut T. An &mut T is an exclusive reference. It is impossible in safe code to create two live &mut T to the same T. The only way to violate this guarantee is through broken unsafe code.

If Mutex had other invariants that aren't encoded by the type-system, then it would need to be an unsafe trait as a purely safe implementation might not catch those invariants. But as it stands, Rust already has all the compile-time checks to ensure that Safe Rust cannot implement this trait incorrectly, and as such it should not be unsafe.

RTFM has an example of a completely safe implementation of Mutex called Exclusive: https://github.com/japaric/cortex-m-rtfm/blob/master/src/lib.rs#L311. It's trivial to know this implementation of Mutex is correct because the invariants that the trait must guarantee are encoded in the type system.
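
For concreteness, a minimal sketch of such a purely safe implementation, using the trait shape proposed in this RFC (the name mirrors the RTFM type linked above):

pub trait Mutex {
    type Data;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;
}

/// Holding `&mut` access to the wrapper already is exclusive access to the
/// data, so no unsafe code is needed anywhere.
pub struct Exclusive<T>(pub T);

impl<T> Mutex for Exclusive<T> {
    type Data = T;

    fn lock<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut self.0)
    }
}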

@Rahix

Rahix commented Oct 1, 2019

Sorry for replying this late ...

As the overall reservation is on how a shared bus implementation works with this trait, we have provided a quick-and-dirty proof of concept implementation that can be used as reference. One can probably implement a shared bus in different ways, but this should highlight that there is no problem in using the current trait.

Yes, the idea you outlined in the code is what I had thought about as well. Though this only works for a shared-bus-type scenario where we can let the user create the mutex for us. In the "private-data"-case this would not work. But maybe we should try and gather some actual examples of drivers/libraries where this is the case to gauge whether this is actually an issue worth thinking about.

Another thing I'd like to talk about is the RTFM example you provided here. From my understanding of RTFM, it is not realistic to create the driver instances in the tasks, as that would mean they are recreated every time the task runs. Instead we'd want an implementation where the instances are created in init(). This would also be more flexible in regards to cases where access to the same driver from multiple tasks is needed.

I think such an implementation is not possible with a design close to the current shared-bus. I'd not worry about this too much though and keep shared-bus for use with Type A Mutexes / use-cases outside RTFM. For RTFM we can think about a different bus-sharing design. I have some ideas here but haven't tried anything yet.

Adding this constructor to the trait makes it less flexible.

I agree, adding the create() method to the main Mutex-trait would be a very bad idea. I like your

trait MutexCreate: Mutex {
    fn create() -> Self;
}

and would argue this should also be a part of core-mutex.

The latter allows for overloading the implementation and this can lead to inference errors.

Oh, I had completely overlooked this! In that case I agree, the associated type makes way more sense. Would you mind detailing this design decision in the RFC?

To alleviate this one can implement a Ref struct which simply wraps the reference, giving the interface &mut Ref which could be easier to swallow.

I'd be much more comfortable with something going in this direction because the Ref-wrapper could protect against recursive use again by not implementing Copy. What about adding this type to core-mutex as well (the name might not be ideal)?

#[derive(Clone, Debug)]
struct MutexRef<M: Mutex + Clone>(M);

impl<M: Mutex + Clone> Mutex for MutexRef<M> {
    type Data = M::Data;

    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R {
        self.0.lock(f)
    }
}

impl<M: Mutex + Clone + MutexCreate> MutexCreate for MutexRef<M> {
    fn create() -> Self {
        MutexRef(M::create())
    }
}

remember that this RFC aims to provide an interface to lock a mutex, not an implementation

The Type A vs Type B stuff was meant to be about the interface the mutex provides, not its implementation. The point where the two meet is where some implementations are not compatible with some mutex APIs. Which is totally fine. I would just like to see all capabilities exposed for mutex implementations which have them. And in my opinion this needs to be part of core-mutex to prevent fragmentation of the ecosystem.


My suggestion is this:

  • The current mutex trait is fine as is
  • Adding a MutexCreate trait for implementations supporting this
  • Also adding a MutexRef type for shareable mutexes

This would allow a mapping like this (a small sketch follows the list):

  • A Type A Mutex is any type implementing Mutex<Data = T> + MutexCreate + Clone
  • A Type B Mutex is any type implementing Mutex<Data = T> + !MutexCreate + !Clone
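
To make the mapping concrete, a small sketch using the trait shapes from this thread (how the data gets into the mutex is left aside here; it is picked up in the comments below):

// Trait shapes as discussed above (sketch).
pub trait Mutex {
    type Data;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;
}

pub trait MutexCreate: Mutex {
    fn create() -> Self;
}

// "Type A" usage: generic code may create and clone the mutex itself,
// e.g. a shared-bus style manager handing out handles.
fn make_handles<M: MutexCreate + Clone>() -> (M, M) {
    let m = M::create();
    (m.clone(), m)
}

// "Type B" usage: generic code can only lock a mutex it was handed,
// e.g. an RTFM resource passed into a driver.
fn with_data<M: Mutex>(m: &mut M, f: impl FnOnce(&mut M::Data)) {
    m.lock(f)
}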

@roblabla

roblabla commented Oct 1, 2019

I agree, adding the create() method to the main Mutex-trait would be a very bad idea. I like your MutexCreate suggestion

Uuh, shouldn’t this take a Mutex::Data? I expected something more like

trait MutexCreate: Mutex {
    fn create(data: Self::Data) -> Self;
}

Otherwise the mutex wouldn’t get initialized.

But this raises the question: why not simply use From<Mutex::Data>? Basically, say that mutexes should implement From in order to allow easy generic construction.

IMO MutexCreate shouldn’t exist if it’s going to be a separate trait. It’s redundant; Rust already has a plethora of construction traits.

@eldruin
Member

eldruin commented Oct 2, 2019

I have an example that might be of interest here.
I think for some drivers like pcf857x (I/O expander) the MutexI2c would work, and furthermore, allow the user to decide whether they need the I2C accesses to be mutexed and pay the performance penalty or not, which offers great flexibility.

However, this would not fit as well when multiple steps are necessary for an operation like in the driver for xca9548a (hardware I2C switch/multiplexer, useful for setups where several slaves share an address, for example).
I recently implemented splitting for this device so that several virtual I2C devices are created and can be passed on to other drivers.

let parts = i2c_switch.split();
let mut my_driver = Driver::new(parts.i2c0);
let mut my_other_driver = Driver::new(parts.i2c1);

When interacting with these virtual devices, before doing the actual i2c.write, it is necessary to select the appropriate channel by communicating with the switch device. See here.
Passing a MutexI2c as input to this driver would adequately lock the access to the i2c resource, which is good, but leaves a race condition between the channel selection and the actual write.
This can lead to a situation where we communicate with the wrong device.
On single-threaded applications this is all solved by a RefCell as all the operations are done together in a closure after acquiring a mutable borrow. On multi-threaded applications, however, an additional mutex is necessary at the same level of the current RefCell.
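
To illustrate the level at which the lock would have to sit, here is a rough sketch where both steps happen under a single lock of the shared bus (the Mutex trait shape is the one proposed in this RFC; RawI2c, its write method and SWITCH_ADDRESS are made-up stand-ins, not embedded-hal items):

pub trait Mutex {
    type Data;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;
}

pub struct RawI2c;
impl RawI2c {
    pub fn write(&mut self, _addr: u8, _bytes: &[u8]) {}
}

const SWITCH_ADDRESS: u8 = 0x70; // illustrative

fn write_behind_switch<M: Mutex<Data = RawI2c>>(
    bus: &mut M,
    channel_mask: u8,
    device_addr: u8,
    payload: &[u8],
) {
    bus.lock(|i2c| {
        // No other bus user can run between these two transactions, so the
        // payload cannot end up on the wrong channel.
        i2c.write(SWITCH_ADDRESS, &[channel_mask]);
        i2c.write(device_addr, payload);
    });
}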

What would be the best way to provide the user with the freedom to say whether accesses need to be mutexed or not and if so, provide the concrete type for the Mutex?

@Rahix

Rahix commented Oct 3, 2019

Uuh, shouldn’t this take a Mutex::Data?

@roblabla: Wow, yes, how did I miss this?!

But this begs the question, why not simply use From<Mutex::Data>? Basically say that mutexes should implement From in order to allow easy generic construction.

Makes sense. Though I think adding a From<Mutex::Data> bound will look confusing for people who don't know all the details so I'd suggest also adding a 'trait-alias' to make it easy to see the intent:

trait MutexCreate: Mutex + From<<Self as Mutex>::Data> { }

impl<M: Mutex + From<<M as Mutex>::Data>> MutexCreate for M { }

@Rahix

Rahix commented Oct 3, 2019

@eldruin: We have the same issue in shared-bus with SPI right now, where there is a race-condition between setting the chip-select and locking the bus to do the transaction (See Rahix/shared-bus#8).

@korken89
Contributor Author

Hi all,

I'd like to thank everyone for all their input on this issue! We are going to start FCP now (1 week long), with the following:

  • The RFC will have a minor update to comments and, unless there are issues, will be accepted as is
  • The additions recommended by @Rahix and others that have been extensively discussed will be moved into the proposed crate; this is so we do not need to go through voting again with this RFC (most of the discussed features only need to go through official approval)

Thanks again for enlightening discussions!

@Rahix

Rahix commented Oct 28, 2019

Something we have not yet talked about at all is that the current design does not provide any way to acquire a mutex in a non-blocking fashion. This makes the mutex unsuitable for use in ISRs and similar contexts which are not allowed to block under any circumstances. It would probably be a good idea to include a second lock method that immediately returns if the mutex cannot be acquired. That is:

pub trait Mutex {
    type Data;

    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;

    // Attempt locking the mutex but return immediately if this is not possible
    fn try_lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> nb::Result<R, ()>;
}
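
For illustration only, a caller that must not block (say, a FreeRTOS timer callback) could then consume the proposed method like this; SensorBus, read_temperature and record_sample are made-up names, and the Mutex trait is the one sketched just above:

struct SensorBus;

impl SensorBus {
    fn read_temperature(&mut self) -> i32 {
        25 // stub value for the sketch
    }
}

fn record_sample(_temp: i32) {}

fn on_timer_tick<M: Mutex<Data = SensorBus>>(bus: &mut M) {
    match bus.try_lock(|sensors| sensors.read_temperature()) {
        Ok(temp) => record_sample(temp),
        // The bus is currently held elsewhere: skip this sample and try
        // again on the next tick instead of blocking in the callback.
        Err(nb::Error::WouldBlock) => {}
        Err(nb::Error::Other(())) => {}
    }
}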

@korken89
Contributor Author

@Rahix This is by design and ties back to the original fallible vs infallible discussion. A fallible mutex can be implemented on top of this; however, this RFC is specifically for an infallible mutex, as we discussed earlier.

In the continuation of this we will look into expanding these traits and implementations; that direction should then have its own RFC/issue, with a discussion focused only on it, so we do not miss important things, as can happen if one RFC grows too large.

When we have the base crate up I will make issues for each of the extra features and ideas here and link in each person connected to it so each extension can get its own discussion.
Thanks for bringing it up!

@Rahix

Rahix commented Oct 29, 2019

This is by design and ties back to the original fallible vs infallible discussion. A fallible mutex can be implemented on top of this; however, this RFC is specifically for an infallible mutex, as we discussed earlier.

The way I understand the fallible vs. infallible discussion, an infallible mutex is just defined to always be able to acquire the mutex eventually. There are no guarantees about when that will happen and there might very well be a considerable time between start of the locking attempt and actual acquisition. That means, there might be a time where the caller is blocked, waiting on the current mutex holder to release it.

This behavior makes the lock() function unusable in contexts where blocking is absolutely forbidden. For example, in an ISR or a FreeRTOS timer callback. For such contexts, a function is needed that returns immediately if the mutex is currently blocked by someone else. This does not mean that our mutex type was fallible, it just means the mutex is currently in use elsewhere and will be unblocked some (finite) time in the future.


As a parallel, take a look at the mutex implementation in the Linux kernel: mutex_lock() is defined like

void mutex_lock(struct mutex *lock);

The void return type means that when the function returns, the mutex is now under all circumstances owned by the current task. Thus, locking a mutex is infallible, just like it is with our mutex trait. This does not, however, forbid mutex_lock() from rescheduling until the mutex is freed by the current owner.

But for IRQs, tasklets and other contexts where rescheduling or any other kind of blocking is not allowed, there is a second

int mutex_trylock(struct mutex *lock);

function that returns immediately whether it was able to acquire the mutex or not. This function is now safe for any kind of context.


In the continuation of this we will look into expanding these traits

I fear that standardizing a trait and expanding it later will be very painful as we saw with the OutputPin. We will need version-hacks again for backwards-compatibility and it will take a very long time until all implementations update to the new upstream. While we can't guarantee our current design to be flawless in any way, we can at least resolve any discussions which are brought up before standardization (even if it means to decide on postponing the feature).

@japaric
Member

japaric commented Nov 4, 2019

the current design does not provide any way to acquire a mutex in a non-blocking fashion

Can you say more about the use case you have in mind for this generic
try_lock API? The Mutex trait is for inter-operation of crates via generic
code. How would a driver crate use this try_lock API, for instance, and how
would it be consumed by an application? (I)


I think I should say at this point -- not to anyone in particular -- that a
trait is not a good way to "standardize an API". Library authors are free to
implement or not any particular trait (e.g. try_lock, as proposed,
semantically makes zero sense for RTFM resources); announcing a trait is not
going to make the ecosystem grow crates that implement it; etc.

Traits (not extension traits, though) are mainly a code reuse mechanism; they
let you for example combine any driver (of M) with any HAL impl (of N) without
having to write M * N implementations -- only M + N implementations need to
be written.
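
As a small sketch of that reuse (the trait shape is the one from this RFC,
repeated here so the example stands alone; TempSensor, I2cBus and
read_register are made up):

pub trait Mutex {
    type Data;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;
}

// One generic driver (one of the "M" crates) ...
struct TempSensor<M> {
    bus: M,
    address: u8,
}

impl<M: Mutex<Data = I2cBus>> TempSensor<M> {
    fn read(&mut self) -> u8 {
        let address = self.address;
        // ... works with any mutex implementation (the "N" crates) without
        // knowing whether it is an RTFM resource, a critical section, an
        // RTOS mutex, and so on.
        self.bus.lock(|bus| bus.read_register(address, 0x00))
    }
}

// Made-up bus type so the sketch compiles on its own.
struct I2cBus;
impl I2cBus {
    fn read_register(&mut self, _addr: u8, _reg: u8) -> u8 {
        0
    }
}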


use in ISRs and similar contexts which are not allowed to block under any circumstances

IMO, in this case one should look for and use a particular mutex that has the
desired properties (for example RTFM mutexes are non-blocking; the entry and
exit code always executes in constant time) or the desired API (e.g.
lock_timeout) rather than use generic code or pick a type based on the traits
it implements.

To give an example, under this try_lock proposal would it be OK to use
SomeDriver<M> where M: Mutex in ISR / must-not-block context? How do I know if
SomeDriver is internally using the lock API or the try_lock API? I can't
tell from the signature; I would have to look at the implementation and if I
make a decision based on that then I'm relying on implementation details that
may change at anytime because they are not subject to semver. (II)

From this example alone, I would not add try_lock to the Mutex trait but
rather create a separate trait. Also, I should add that
SomeDriver<rtfm::Mutex> would be a good fit for must-not-block context under
the main proposal which does not include the try_lock API.

As a parallel, take a look at the mutex implementation in the Linux kernel

I don't think this is a good comparison. This Linux API is a concrete function /
API, not a generic API with many implementations -- there's no code reuse there;
the whole kernel uses the same mutex type. This is a bit like saying we should
create a trait X only because there's a concrete type X in the Rust
standard library -- we should instead consider what code reuse benefits come
from creating such trait.

I fear that standardizing a trait and expanding it later will be very painful
as we saw with the OutputPin.

Very different scenario. OutputPin v2 has deprecated v1. The hypothetical
try_lock is meant to live alongside the lock method. Adding a separate
try_lock trait requires no compatibility wrappers or bridge types.


This try_lock idea also seems somewhat close to an "async" mutex (see
futures::lock::Mutex). Such mutex is also non-blocking but you get a
user-space context switch (at least when using it in a tokio context) instead of
blocking behavior (e.g. thread::sleep). Probably not something you want to call
from an ISR / time-sensitive context, but the concept of an async mutex should also be
evaluated as a potential additional trait -- it seems very unlikely that a
single type is going to implement lock, try_lock and async_lock at the
same time, or even any two of these methods. (III)


In conclusion, I don't think a try_lock method should be added to the Mutex
trait (see II). It may make sense to have a separate trait for try_lock
depending on how useful it is for code reuse (based on the answer to I). It may
also be worth thinking about an async_lock trait, but this is still far away
given the state of no_std async and impl Trait in trait methods. (see III)

@jamesmunns
Member

jamesmunns commented Nov 5, 2019

Not 100% sure what @Rahix had in mind, but I can see a fallible mutex being useful in two situations:

  1. Non-critical "concurrent" tasks
  2. A multicore system

For point 1, I can see why RTFM is advantageous, and can be considered to "obsolete" these needs, but for simple use cases, it may be useful to occasionally synchronize data without caring too much if we can't always obtain the mutex. For example if there are two "tasks" which both want to access the I2C bus (one to read a temperature sensor, once a second, the other to read an O2 sensor, once every 100ms), it would be nice to be able to simply wrap this in a non-blocking mutex, and just skip update periods when necessary. This isn't suitable for hard realtime or timing critical use cases, but sometimes it's okay to skip a sensor update every now and then if the bus is busy.

For the second use case (multicore), I could see a shared memory buffer that is protected by some kind of spinlock or basic CAS primitive for the purpose of sharing data between two systems. We don't want to necessarily block one core while the other is writing data, particularly if one is just checking whether any messages are pending.
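
As a rough sketch of the kind of primitive I mean for this second case (purely illustrative, not a proposed trait addition):

use core::cell::UnsafeCell;
use core::sync::atomic::{AtomicBool, Ordering};

/// Minimal CAS-based lock for a buffer shared between two cores (sketch).
pub struct SpinTryLock<T> {
    taken: AtomicBool,
    data: UnsafeCell<T>,
}

// Sketch only: this claim deserves the usual scrutiny in real code.
unsafe impl<T: Send> Sync for SpinTryLock<T> {}

impl<T> SpinTryLock<T> {
    pub const fn new(data: T) -> Self {
        Self {
            taken: AtomicBool::new(false),
            data: UnsafeCell::new(data),
        }
    }

    /// Run `f` if the lock is free, otherwise return None immediately so
    /// the caller can simply check back later instead of spinning.
    pub fn try_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        if self
            .taken
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            let r = f(unsafe { &mut *self.data.get() });
            self.taken.store(false, Ordering::Release);
            Some(r)
        } else {
            None
        }
    }
}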

@japaric
Member

japaric commented Nov 5, 2019

I can see a fallible mutex being useful in two situations:

I agree with these use cases. There are certainly use cases for try_lock mutexes as a concept. My question was about a try_lock trait in particular; how would a trait help with these use cases? What APIs benefit from being written in a generic fashion?

In these two examples I would just go to crates.io and shop for a mutex with the right semantics not caring much about the API (the "shape"; i.e. closure API vs guard-based API).

@korken89
Contributor Author

korken89 commented Nov 5, 2019

There is no issue in having a try_lock if we decide to have a trait for it in the future; I think we are starting to focus on the wrong part now.
The Mutex trait is not meant to have all possible methods in it, but to cover the bare essentials and then be expanded with more traits as we find them needed.

However every extension we propose should have its own RFC to get proper focus and discussion as this is a very fundamental core area which is very easy to get wrong if we do not thoroughly think it through.

@thejpster
Contributor

So I think I've been convinced that any scenario requiring a fallible mutex (like round-robin processing of co-operative tasks which greedily acquire resources and hold on to them beyond their run-time while some hardware operation - e.g. DMA - is ongoing in the background) can be resolved by using some other type (e.g. an atomic counter) and an infallible mutex. I agree with @japaric that try_lock shouldn't be in this trait.

@almindor
Contributor

almindor commented Nov 6, 2019

For what it's worth, I also think having a separate trait for try_lock is the way to go. The main problem with that approach is fragmentation of implementations, where I think a lot of them would not implement the "extension traits" and cause compatibility issues across devices/drivers.

I don't think there's a good solution to that, however; even if there were a nice standard, chances are people would still only implement the minimum enforced functionality, so it becomes a question of what's considered required functionality for a Mutex in general.

@glandium

glandium commented Nov 6, 2019

Can someone please answer #377 (comment) ?

@jamesmunns
Member

We've had a lot of discussion, and decided in the meeting this is ready to merge.

Thanks all!

bors r+

@bors
Contributor

bors bot commented Nov 19, 2019

👎 Rejected by label

@jamesmunns jamesmunns removed the needs-decision label ("This RFC or PR needs to be approved by the majority of reviewers before it's merged") Nov 19, 2019
@jamesmunns
Member

bors retry

bors bot added a commit that referenced this pull request Nov 19, 2019
377: [RFC] Add a Mutex trait r=jamesmunns a=korken89

cc @perlindgren @japaric 

[Rendered](https://github.com/korken89/wg/blob/master/rfcs/0377-mutex-trait.md)

Co-authored-by: Emil Fresk <[email protected]>
@bors
Contributor

bors bot commented Nov 19, 2019

Build succeeded

@bors bors bot merged commit 461415e into rust-embedded:master Nov 19, 2019